Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed-signal memristive-Complementary Metal Oxide Semiconductor (CMOS) and Field Programmable Gate Array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.
Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
Recent technological advances have greatly increased the available computing
power, memory, and speed of modern Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs).
Consequently, the performance and complexity of Artificial Neural Networks
(ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs)
currently offer state-of-the-art performance, they consume large amounts of
power. Training such networks on CPUs is inefficient, as data throughput and
parallel computation are limited. FPGAs are considered a suitable candidate for
performance-critical, low-power systems, e.g., Internet of Things (IoT) edge
devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development
environment, networks described using the high-level OpenCL framework can be
accelerated on heterogeneous platforms. Moreover, the resource utilization and
power consumption of DNNs can be further reduced by utilizing regularization
techniques that binarize network weights. In this paper, we introduce, to the
best of our knowledge, the first FPGA-accelerated stochastically binarized DNN
implementations, and compare them to implementations accelerated using both
GPUs and FPGAs. Our developed networks are trained and benchmarked using the
popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art
performance, while offering a >16-fold reduction in power consumption compared
to conventional GPU-accelerated networks. Both our FPGA-accelerated
deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10
by >9.89x and >9.91x, respectively.
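To make the contrast between the two schemes concrete, the following minimal NumPy sketch implements deterministic and stochastic weight binarization in the style popularized by BinaryConnect; it illustrates the arithmetic only and is not the paper's OpenCL implementation (function names are illustrative):

```python
import numpy as np

def hard_sigmoid(x):
    """Clip (x + 1) / 2 to [0, 1]; used as the binarization probability."""
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def binarize(w, stochastic=False, rng=None):
    """Binarize real-valued weights to {-1, +1}.

    Deterministic: sign(w).
    Stochastic: +1 with probability hard_sigmoid(w), else -1.
    """
    if stochastic:
        rng = rng or np.random.default_rng()
        return np.where(rng.random(w.shape) < hard_sigmoid(w), 1.0, -1.0)
    return np.where(w >= 0.0, 1.0, -1.0)

w = np.random.default_rng(0).standard_normal((4, 4))
print(binarize(w))                   # deterministic
print(binarize(w, stochastic=True))  # stochastic
```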
Digital, analog, and memristive implementation of Spike-based Synaptic Plasticity
Synaptic plasticity is believed to play an essential role in learning and memory in the brain. To date, many plasticity algorithms have been devised, some of which have been confirmed in electrophysiological experiments. Perhaps the most popular synaptic plasticity rule, or learning algorithm, among neuromorphic engineers is Spike Timing Dependent Plasticity (STDP). The conventional form of STDP has been implemented in various forms by many groups using different hardware approaches, and has been used for applications such as pattern classification. However, a newer form of STDP, which elicits synaptic efficacy modification based on the timing among a triplet of pre- and post-synaptic spikes, has not been well explored in hardware.
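For readers unfamiliar with the triplet rule referenced above, the sketch below implements the Pfister-Gerstner triplet STDP model with all-to-all spike interactions in Python. The default constants follow the published visual-cortex fit and should be treated as placeholders, not as values used in this work:

```python
import numpy as np

def triplet_stdp(pre_spikes, post_spikes, w0=0.5,
                 tau_plus=16.8, tau_minus=33.7, tau_x=101.0, tau_y=125.0,
                 a2_plus=5e-10, a2_minus=7e-3, a3_plus=6.2e-3, a3_minus=2.3e-4):
    """Pfister-Gerstner triplet STDP with all-to-all interactions.

    pre_spikes / post_spikes: spike times in ms. Traces r1/r2 track
    pre-synaptic activity; o1/o2 track post-synaptic activity.
    """
    events = sorted([(t, 'pre') for t in pre_spikes] +
                    [(t, 'post') for t in post_spikes])
    r1 = r2 = o1 = o2 = 0.0
    w, t_last = w0, 0.0
    for t, kind in events:
        dt = t - t_last
        # Exponentially decay all traces up to the current event time.
        r1 *= np.exp(-dt / tau_plus)
        r2 *= np.exp(-dt / tau_x)
        o1 *= np.exp(-dt / tau_minus)
        o2 *= np.exp(-dt / tau_y)
        if kind == 'pre':   # depression, gated by post-synaptic traces
            w -= o1 * (a2_minus + a3_minus * r2)
            r1 += 1.0
            r2 += 1.0
        else:               # potentiation, gated by pre-synaptic traces
            w += r1 * (a2_plus + a3_plus * o2)
            o1 += 1.0
            o2 += 1.0
        t_last = t
    return w

# A post-synaptic spike shortly after each pre-synaptic spike
# yields net potentiation under these parameters.
print(triplet_stdp(pre_spikes=[10.0, 50.0], post_spikes=[15.0, 55.0]))
```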
Modeling and simulating in-memory memristive deep learning systems: an overview of current efforts
Deep Learning (DL) systems have demonstrated unparalleled performance in many challenging engineering applications. As the complexity of these systems inevitably increases, they require increased processing capabilities and consume larger amounts of power, which are not readily available in resource-constrained processors, such as Internet of Things (IoT) edge devices. Memristive In-Memory Computing (IMC) systems for DL, termed Memristive Deep Learning Systems (MDLSs), perform the computation and storage of repetitive operations in the same physical location using emerging memory devices, and can be used to augment the performance of traditional DL architectures, massively reducing their power consumption and latency. However, memristive devices, such as Resistive Random-Access Memory (RRAM) and Phase-Change Memory (PCM), are difficult and cost-prohibitive to fabricate in small quantities, and are prone to various device non-idealities that must be accounted for. Consequently, the popularity of simulation frameworks, used to simulate MDLSs prior to circuit-level realization, is burgeoning. In this paper, we provide a survey of existing simulation frameworks and related tools used to model large-scale MDLSs. Moreover, we perform direct performance comparisons of modernized open-source simulation frameworks, and provide insights into future modeling and simulation strategies and approaches. We hope that this treatise is beneficial to the broader computer and electrical engineering community, and can help readers better understand available tools and techniques for MDLS development.
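The core idea behind memristive IMC, which the surveyed frameworks all simulate in some form, reduces to a single matrix product: applying voltages to the rows of a conductance matrix yields column currents equal to a Multiply-Accumulate. A minimal, idealized sketch (device values are arbitrary):

```python
import numpy as np

def crossbar_mac(v_in, g):
    """Ideal analog in-memory MAC: row voltages applied across a
    conductance matrix produce column currents I = V @ G (Ohm's law
    plus Kirchhoff's current law), so the multiply and the memory
    access happen in the same physical location."""
    return v_in @ g

v = np.array([0.1, 0.2, 0.0, 0.3])  # row voltages (V)
g = np.full((4, 3), 5e-5)           # device conductances (S)
print(crossbar_mac(v, g))           # column currents (A)
```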
Training Progressively Binarizing Deep Networks Using FPGAs
While hardware implementations of inference routines for Binarized Neural
Networks (BNNs) are plentiful, current realizations of efficient BNN hardware
training accelerators, suitable for Internet of Things (IoT) edge devices,
leave much to be desired. Conventional BNN hardware training accelerators
perform forward and backward propagations with parameters adopting binary
representations, and optimization using parameters adopting floating or
fixed-point real-valued representations, requiring two distinct sets of network
parameters. In this paper, we propose a hardware-friendly training method that,
contrary to conventional methods, progressively binarizes a singular set of
fixed-point network parameters, yielding notable reductions in power and
resource utilizations. We use the Intel FPGA SDK for OpenCL development
environment to train our progressively binarizing DNNs on an OpenVINO FPGA. We
benchmark our training approach on both GPUs and FPGAs using CIFAR-10 and
compare it to conventional BNNs. (Accepted at the 2020 IEEE International Symposium on Circuits and Systems, ISCAS.)
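One common way to realize progressive binarization, illustrated in the PyTorch sketch below, is to pass a single set of parameters through a tanh whose gain is annealed upward during training, so the forward pass smoothly approaches sign(w). The gain schedule and toy task here are illustrative assumptions, not necessarily the paper's exact method:

```python
import torch

def progressive_binarize(w, gain):
    """tanh(gain * w) approaches sign(w) as the gain grows, so one
    set of parameters is annealed from real-valued toward binary
    over the course of training."""
    return torch.tanh(gain * w)

# Toy regression with a single linear map whose weights are
# progressively binarized (illustrative gain schedule).
torch.manual_seed(0)
x = torch.randn(256, 16)
y = x @ torch.sign(torch.randn(16, 1))  # binary ground truth
w = torch.randn(16, 1, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-2)
for epoch in range(50):
    gain = 1.0 + 0.5 * epoch            # anneal 1.0 -> 25.5
    loss = ((x @ progressive_binarize(w, gain) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(progressive_binarize(w, 25.5).detach().T)  # effectively binary
```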
Variation-aware binarized memristive networks
The quantization of weights to binary states in Deep Neural Networks (DNNs) can replace resource-hungry multiply-accumulate operations with simple accumulations. Such Binarized Neural Networks (BNNs) exhibit greatly reduced resource and power requirements. In addition, memristors have been shown to be promising synaptic weight elements in DNNs. In this paper, we propose and simulate novel Binarized Memristive Convolutional Neural Network (BMCNN) architectures employing hybrid weight and parameter representations. We train the proposed architectures offline and then map the trained parameters to our binarized memristive devices for inference. To take into account the variations in memristive devices, and to study their effect on performance, we introduce variations in R_ON and R_OFF. Moreover, we introduce means to mitigate the adverse effect of memristive variations in our proposed networks. Finally, we benchmark our BMCNNs and variation-aware BMCNNs using the MNIST dataset.
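A minimal NumPy sketch of the kind of variation modeling described above: binary weights are mapped to nominal R_ON/R_OFF states, and each device is perturbed log-normally. The resistance values, distribution, and spread are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_conductances(w_binary, r_on=1e4, r_off=1e6, sigma=0.2):
    """Map binary weights {-1, +1} to low/high resistance states and
    perturb each device log-normally to model device-to-device and
    cycle-to-cycle variation (sigma is an assumed spread)."""
    r_nominal = np.where(w_binary > 0, r_on, r_off)
    r_actual = r_nominal * rng.lognormal(mean=0.0, sigma=sigma,
                                         size=w_binary.shape)
    return 1.0 / r_actual  # device conductances (S)

w = np.sign(rng.standard_normal((8, 4)))  # binarized weights
g = sample_conductances(w)
currents = rng.uniform(0.0, 0.3, 8) @ g   # crossbar read-out (A)
print(currents)
```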
MemTorch: An Open-source Simulation Framework for Memristive Deep Learning Systems
Memristive devices have shown great promise to facilitate the acceleration
and improve the power efficiency of Deep Learning (DL) systems. Crossbar
architectures constructed using memristive devices can be used to efficiently
implement various in-memory computing operations, such as Multiply-Accumulate
(MAC) and unrolled-convolutions, which are used extensively in Deep Neural
Networks (DNNs) and Convolutional Neural Networks (CNNs). Currently, there is a
lack of a modernized, open source and general high-level simulation platform
that can fully integrate any behavioral or experimental memristive device model
and its putative non-idealities into crossbar architectures within DL systems.
This paper presents such a framework, entitled MemTorch, which adopts a
modernized software engineering methodology and integrates directly with the
well-known PyTorch Machine Learning (ML) library. We fully detail the public
release of MemTorch and its release management, and use it to perform novel
simulations of memristive DL systems, which are trained and benchmarked using
the CIFAR-10 dataset. Moreover, we present a case study, in which MemTorch is
used to simulate a near-sensor in-memory computing system for seizure detection
using Pt/Hf/Ti Resistive Random Access Memory (ReRAM) devices. Our open source
MemTorch framework can be used and expanded upon by circuit and system
designers to conveniently perform customized large-scale memristive DL
simulations taking into account various unavoidable device non-idealities, as a
preliminary step before circuit-level realization. (Submitted to IEEE Transactions on Neural Networks and Learning Systems.)
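For orientation, the snippet below follows the usage pattern shown in MemTorch's public README: a trained PyTorch model is patched so that its linear layers are simulated as memristive crossbar tiles. The keyword arguments and module paths are taken from the README and may differ between MemTorch releases, so treat this as a sketch rather than a pinned API reference:

```python
import copy
import torch
import memtorch
from memtorch.mn.Module import patch_model
from memtorch.map.Input import naive_scale
from memtorch.map.Parameter import naive_map

# A trained network to convert (placeholder architecture).
model = torch.nn.Sequential(torch.nn.Linear(784, 128),
                            torch.nn.ReLU(),
                            torch.nn.Linear(128, 10))

# Patch the Linear layers so their MACs are performed by simulated
# VTEAM-modeled memristive crossbar tiles with an 8-bit ADC model.
patched_model = patch_model(copy.deepcopy(model),
                            memristor_model=memtorch.bh.memristor.VTEAM,
                            memristor_model_params={'time_series_resolution': 1e-10},
                            module_parameters_to_patch=[torch.nn.Linear],
                            mapping_routine=naive_map,
                            transistor=True,
                            programming_routine=None,
                            tile_shape=(128, 128),
                            max_input_voltage=0.3,
                            scaling_routine=naive_scale,
                            ADC_resolution=8,
                            ADC_overflow_rate=0.0,
                            quant_method='linear')
patched_model.tune_()  # fit linear transforms between crossbar and ideal outputs
```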
Memristive Stochastic Computing for Deep Learning Parameter Optimization
Stochastic Computing (SC) is a computing paradigm that allows for the
low-cost and low-power computation of various arithmetic operations using
stochastic bit streams and digital logic. In contrast to conventional
representation schemes used within the binary domain, the sequence of bit
streams in the stochastic domain is inconsequential, and computation is usually
non-deterministic. In this brief, we exploit the stochasticity during switching
of probabilistic Conductive Bridging RAM (CBRAM) devices to efficiently
generate stochastic bit streams in order to perform Deep Learning (DL)
parameter optimization, reducing the size of Multiply and Accumulate (MAC)
units by 5 orders of magnitude. We demonstrate that in using a 40-nm
Complementary Metal Oxide Semiconductor (CMOS) process our scalable
architecture occupies 1.55 mm² and consumes approximately 167 µW when
optimizing parameters of a Convolutional Neural Network (CNN) while it is being
trained for a character recognition task, observing no notable reduction in
accuracy post-training. (Accepted by IEEE Transactions on Circuits and Systems II: Express Briefs.)
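The arithmetic that makes SC so cheap is easy to demonstrate: in the unipolar encoding, multiplying two values reduces to a bitwise AND of their bit streams. The sketch below uses a pseudorandom generator where the paper exploits CBRAM switching stochasticity as the entropy source:

```python
import numpy as np

rng = np.random.default_rng(0)

def to_stream(p, n_bits=4096):
    """Encode a value p in [0, 1] as a stochastic bit stream whose
    mean approximates p; the order of bits is inconsequential."""
    return rng.random(n_bits) < p

def sc_multiply(p_a, p_b, n_bits=4096):
    """Unipolar stochastic multiplication: one AND gate per bit pair
    replaces a full digital multiplier; precision scales with the
    stream length rather than with gate count."""
    product_stream = to_stream(p_a, n_bits) & to_stream(p_b, n_bits)
    return product_stream.mean()  # decode: fraction of ones

print(sc_multiply(0.5, 0.4))  # ~0.20, up to stochastic error
```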
Simulation of memristive crossbar arrays for seizure detection and prediction using parallel Convolutional Neural Networks
To address the computational bottleneck of the von Neumann architecture for epileptic seizure detection and prediction, we develop an in-memory memristive crossbar-based accelerator simulator. The simulator software is composed of a Python-based neural network training component and a MATLAB-based memristive crossbar array component. The software provides a baseline network for developing deep learning-based signal processing tasks, as well as a platform to investigate the impact of weight mapping schemes and of device and peripheral circuitry non-idealities.
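As an example of the weight mapping schemes and peripheral non-idealities such a simulator can investigate, the NumPy sketch below maps signed weights onto differential conductance pairs and quantizes the column read-out with a uniform ADC model; the conductance ranges and ADC resolution are illustrative assumptions:

```python
import numpy as np

def map_differential(w, g_min=1e-6, g_max=1e-4):
    """Map signed weights onto (G+, G-) conductance pairs so that
    w is proportional to G+ - G-; one of several mapping schemes
    such a simulator can compare."""
    scale = (g_max - g_min) / np.abs(w).max()
    g_pos = g_min + scale * np.clip(w, 0.0, None)
    g_neg = g_min + scale * np.clip(-w, 0.0, None)
    return g_pos, g_neg

def adc_quantize(i, n_bits=8):
    """Uniform ADC model for the column read-out; a simple stand-in
    for peripheral-circuitry non-idealities."""
    step = (i.max() - i.min()) / (2 ** n_bits - 1)
    return np.round((i - i.min()) / step) * step + i.min()

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 16))            # trained weights
g_pos, g_neg = map_differential(w)
v = rng.uniform(0.0, 0.2, 64)                # input voltages (V)
i_out = adc_quantize(v @ g_pos - v @ g_neg)  # quantized currents (A)
print(i_out[:4])
```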
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic
processors, new opportunities are emerging for applying deep and Spiking Neural
Network (SNN) algorithms to healthcare and biomedical applications at the edge.
This can facilitate the advancement of the medical Internet of Things (IoT)
systems and Point of Care (PoC) devices. In this paper, we provide a tutorial
describing how various technologies ranging from emerging memristive devices,
to established Field Programmable Gate Arrays (FPGAs), and mature Complementary
Metal Oxide Semiconductor (CMOS) technology can be used to develop efficient DL
accelerators to solve a wide variety of diagnostic, pattern recognition, and
signal processing problems in healthcare. Furthermore, we explore how spiking
neuromorphic processors can complement their DL counterparts for processing
biomedical signals. After providing the required background, we unify the
sparsely distributed research on neural network and neuromorphic hardware
implementations as applied to the healthcare domain. In addition, we benchmark
various hardware platforms by performing a biomedical electromyography (EMG)
signal processing task and drawing comparisons among them in terms of inference
delay and energy. Finally, we provide our analysis of the field and share a
perspective on the advantages, disadvantages, challenges, and opportunities
that different accelerators and neuromorphic processors introduce to healthcare
and biomedical domains. This paper can serve a large audience, ranging from
nanoelectronics researchers to biomedical and healthcare practitioners, in
grasping the fundamental interplay between hardware, algorithms, and the
clinical adoption of these tools, as we shed light on the future of deep
networks and spiking neuromorphic processing systems as proponents for driving
biomedical circuits and systems forward. (Submitted to IEEE Transactions on
Biomedical Circuits and Systems.)